DAFx Paper Archive - Browse all papers by Goto, M.

Query-by-Example Music Retrieval Approach Based on Musical Genre Shift by Changing Instrument Volume

Katsutoshi Itoyama; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

DAFx-2009 - Como

We describe a novel Query-by-Example (QBE) approach in Music Information Retrieval that allows a user to customize query examples by directly modifying the volume of different instrument parts. The underlying hypothesis is that the musical genre shifts (changes) in relation to the volume balance of different instruments. On the basis of this hypothesis, we aim to clarify the relationship between the change in the volume balance of a query and the shift in the musical genre of retrieved similar pieces, and thus help instruct a user in generating alternative queries without choosing other pieces. Our QBE system first separates all instrument parts from the audio signal of a piece with the help of its musical score, and then lets a user remix those parts to change the acoustic features that represent the musical mood of the piece. The distribution of those features is modeled by a Gaussian Mixture Model for each musical piece, and the Earth Mover's Distance between the mixtures of different pieces is used as the degree of their mood similarity. Experimental results showed that the genre shift was actually caused by changing the volume of the vocal, guitar, and drum parts.
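The remixing step described in the abstract can be illustrated with a minimal sketch. It assumes the instrument parts have already been separated into equal-length sample lists and that the user expresses each volume change in decibels; the names `db_to_gain` and `remix` and the dB convention are illustrative assumptions, not taken from the paper.

```python
def db_to_gain(db):
    """Convert a volume change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(parts, volume_db):
    """Sum separated parts after scaling each by its own volume change.

    parts: dict mapping a part name to a list of samples (equal lengths).
    volume_db: dict mapping a part name to a volume change in dB.
               Parts not listed are left unchanged (0 dB).
    Both the data layout and the dB convention are assumptions of this sketch.
    """
    lengths = {len(samples) for samples in parts.values()}
    if len(lengths) != 1:
        raise ValueError("all parts must have the same number of samples")
    mix = [0.0] * lengths.pop()
    for name, samples in parts.items():
        gain = db_to_gain(volume_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix
```

For example, `remix(parts, {"vocal": -20.0})` attenuates the vocal part to one tenth of its amplitude and leaves the other parts untouched. The remixed signal would then be the input to feature extraction for the modified query.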


SpeakBySinging: Converting Singing Voices to Speaking Voices While Retaining Voice Timbre

Shimpei Aso; Takeshi Saitou; Masataka Goto; Katsutoshi Itoyama; Toru Takahashi; Kazunori Komatani; Tetsuya Ogata; Hiroshi Okuno

DAFx-2010 - Graz

This paper describes a singing-to-speaking synthesis system called “SpeakBySinging” that can synthesize a speaking voice from an input singing voice and the song lyrics. The system controls three acoustic features that determine the difference between speaking and singing voices: the fundamental frequency (F0), phoneme duration, and power (volume). By changing these features of a singing voice, the system synthesizes a speaking voice while retaining the timbre of the singing voice. The system first analyzes the singing voice to extract the F0 contour, the duration of each phoneme of the lyrics, and the power. These features are then converted to target values that are obtained by feeding the lyrics into a traditional text-to-speech (TTS) system. The system finally generates a speaking voice that preserves the timbre of the singing voice but has speech-like features. Experimental results show that SpeakBySinging can convert singing voices into speaking voices whose timbre is almost the same as the original singing voices.
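The duration conversion mentioned in the abstract can be pictured with a small sketch. It assumes the sung signal is already split into analysis frames with a known frame count per phoneme, and that the TTS system supplies a target frame count for each phoneme. The function `warp_frame_indices` and the nearest-frame selection rule are hypothetical illustrations, not the method used in the paper.

```python
def warp_frame_indices(sung_frames, target_frames):
    """For each phoneme, pick sung-frame indices that fill the target length.

    sung_frames, target_frames: per-phoneme frame counts, in lyric order.
    Returns one flat list of indices into the sung frame sequence, so that
    timbre frames can be reused at speech-like durations.
    (Hypothetical helper: the selection rule is an assumption of this sketch.)
    """
    if len(sung_frames) != len(target_frames):
        raise ValueError("both lists must cover the same phonemes")
    indices = []
    start = 0  # index of the first sung frame of the current phoneme
    for n_sung, n_target in zip(sung_frames, target_frames):
        if n_sung <= 0:
            raise ValueError("each sung phoneme needs at least one frame")
        for k in range(n_target):
            # Map target frame k proportionally onto the sung phoneme.
            offset = min(n_sung - 1, (k * n_sung) // n_target)
            indices.append(start + offset)
        start += n_sung
    return indices
```

With the selected frames supplying the timbre, the F0 and power values for each output frame would come from the TTS-derived targets rather than from the sung signal.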

